U.S. Presidential Inaugural Speeches
What did they say? What are the differences between the Republicans and the Democrats? Is Trump an alien?
By Shiqi Duan
The inaugural speech is the first official speech a president of the United States gives in office. In this project, we apply natural language processing and text mining techniques to explore what the presidents said in their inaugural speeches, what ideological values they conveyed, and what kinds of emotions they expressed. We identify several common patterns in their speaking strategies and some interesting clusters among their topics.
Comparing the inaugural speeches of presidents from different parties reveals some trends, and relating the presidents’ topics to American history yields further insight. Building on the analysis of all the presidents, we then focus on Trump to see whether he is an outlier (an “alien”) among U.S. presidents.
Part 1: Sentence Analysis:
First of all, we analyze the length of sentences in the inaugural speeches.
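The sentence-level table used below (`sentence.list`, with a `word.count` column) is built outside this excerpt. As a rough illustration only, here is a minimal base R sketch of how sentence lengths could be computed; the toy text, the splitting rule, and the name `sentence.toy` are assumptions, not the project's actual preprocessing.

```r
# Toy text (made up); in the project each speech would be read from its own file.
speech <- "We the people are one nation. Our journey is not complete! What shall we do next?"

# Split on whitespace that follows sentence-ending punctuation.
sentences <- unlist(strsplit(speech, "(?<=[.!?])\\s+", perl = TRUE))

# Count words as whitespace-separated tokens.
word.count <- lengths(strsplit(sentences, "\\s+"))

# Hypothetical sentence-level table, mirroring the columns used later on.
sentence.toy <- data.frame(sent.id = seq_along(sentences),
                           sentences = sentences,
                           word.count = word.count,
                           stringsAsFactors = FALSE)
print(sentence.toy)
```

A real pipeline would need a more careful sentence splitter, since abbreviations such as "U.S." would be cut incorrectly by this simple rule.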
For simpler visualization, we assign each inauguration (a president together with the term) a unique index in chronological order and use that index in the following analysis:
indx = data.frame(President.order=order,Index=1:58)
inau_infor = cbind(indx,inau_infor)
indx
## President.order Index
## 1 GeorgeWashington-1 1
## 2 GeorgeWashington-2 2
## 3 JohnAdams-1 3
## 4 ThomasJefferson-1 4
## 5 ThomasJefferson-2 5
## 6 JamesMadison-1 6
## 7 JamesMadison-2 7
## 8 JamesMonroe-1 8
## 9 JamesMonroe-2 9
## 10 JohnQuincyAdams-1 10
## 11 AndrewJackson-1 11
## 12 AndrewJackson-2 12
## 13 MartinvanBuren-1 13
## 14 WilliamHenryHarrison-1 14
## 15 JamesKPolk-1 15
## 16 ZacharyTaylor-1 16
## 17 FranklinPierce-1 17
## 18 JamesBuchanan-1 18
## 19 AbrahamLincoln-1 19
## 20 AbrahamLincoln-2 20
## 21 UlyssesSGrant-1 21
## 22 UlyssesSGrant-2 22
## 23 RutherfordBHayes-1 23
## 24 JamesGarfield-1 24
## 25 GroverCleveland-I-1 25
## 26 BenjaminHarrison-1 26
## 27 GroverCleveland-II-2 27
## 28 WilliamMcKinley-1 28
## 29 WilliamMcKinley-2 29
## 30 TheodoreRoosevelt-1 30
## 31 WilliamHowardTaft-1 31
## 32 WoodrowWilson-1 32
## 33 WoodrowWilson-2 33
## 34 WarrenGHarding-1 34
## 35 CalvinCoolidge-1 35
## 36 HerbertHoover-1 36
## 37 FranklinDRoosevelt-1 37
## 38 FranklinDRoosevelt-2 38
## 39 FranklinDRoosevelt-3 39
## 40 FranklinDRoosevelt-4 40
## 41 HarrySTruman-1 41
## 42 DwightDEisenhower-1 42
## 43 DwightDEisenhower-2 43
## 44 JohnFKennedy-1 44
## 45 LyndonBJohnson-1 45
## 46 RichardNixon-1 46
## 47 RichardNixon-2 47
## 48 JimmyCarter-1 48
## 49 RonaldReagan-1 49
## 50 RonaldReagan-2 50
## 51 GeorgeBush-1 51
## 52 WilliamJClinton-1 52
## 53 WilliamJClinton-2 53
## 54 GeorgeWBush-1 54
## 55 GeorgeWBush-2 55
## 56 BarackObama-1 56
## 57 BarackObama-2 57
## 58 DonaldJTrump-1 58
Overview of sentence length distribution in all inaugural speeches:
sentence.list$Index=factor(sentence.list$Index)
sentence.list$Party=factor(sentence.list$Party)
par(mar=c(2,2,2,2))
# plot word.count of sentences and use different color to represent Parties
beeswarm(word.count~Index,
data=sentence.list,
horizontal = TRUE,
pch=16, col=as.numeric(inau_infor$Party)+1,
cex=0.55, cex.axis=0.8, cex.lab=0.8,
las=2, ylab="President Index", xlab="Number of words in a sentence.",
main="Inaugural Speeches")
legend("topright",legend=levels(inau_infor$Party),fill=2:(length(levels(inau_infor$Party))+1),cex=1)
The beeswarm plot shows that, from George Washington to Donald Trump, sentences in the inaugural speeches have tended to become shorter.
Reasons:
Easy to Understand
Easy to Remember
When presidents address the public, they aim to convey their thoughts effectively. By the end of a long sentence, listeners may have forgotten some of the words, or may struggle to follow the core idea because the sentence rambles. Short sentences make for more powerful communication. The beeswarm plot also shows no clear difference in sentence length between parties.
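The downward trend in sentence length is read off the plot by eye. One possible way to check such a trend numerically is to average the word counts per speech and compute a rank correlation with the chronological index. The sketch below uses made-up numbers in a hypothetical `toy.sent` table; it illustrates the idea only and is not a result from the actual speeches.

```r
# Toy sentence-level data: 4 hypothetical speeches with made-up word counts.
toy.sent <- data.frame(
  Index      = rep(1:4, each = 3),
  word.count = c(45, 60, 52,  38, 41, 47,  25, 30, 33,  15, 22, 18)
)

# Mean sentence length per speech.
mean.len <- aggregate(word.count ~ Index, data = toy.sent, FUN = mean)

# Spearman rank correlation between chronological index and mean length;
# a value close to -1 would indicate sentences getting shorter over time.
rho <- cor(mean.len$Index, mean.len$word.count, method = "spearman")
print(mean.len)
print(rho)
```

Applied to the real data, the same two steps would use `sentence.list` in place of `toy.sent`, with `Index` converted back to a number if it has already been turned into a factor.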
We now focus on recent years and take a look at the speech of Donald Trump (58):
sel.contemp<-filter(sentence.list,Year>=1981)
sel.contemp$Index<-factor(sel.contemp$Index)
par(mar=c(2,2,2,2))
beeswarm(word.count~Index,
data=sel.contemp,
horizontal = TRUE,
pch=16, col=rainbow(length(levels(sel.contemp$Index))),
cex=0.5, cex.axis=0.8, cex.lab=0.8,
spacing=0.5/nlevels(sel.contemp$Index),
las=2, xlab="Number of words in a sentence.", ylab="President Index",
main="Inaugural Speeches in the Contemporary Era (1981-current)")
In the plot above, there is no significant difference between Trump and the other presidents of the contemporary era.
WordCloud
Because all the presidents may mention several common topics in their inaugural speeches, we weight their speeches by TF-IDF to highlight the terms of specific interest to each president.
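To make the weighting concrete, here is a small base R sketch of un-normalized TF-IDF on a made-up count matrix. It assumes the form tf × log2(N / df), which is our understanding of what `weightTfIdf(normalize = FALSE)` computes; the matrix `counts` and its values are purely illustrative.

```r
# Toy document-term counts (rows = documents, columns = terms); made-up numbers.
counts <- matrix(c(3, 0, 2,
                   1, 4, 2,
                   0, 0, 2),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("doc1", "doc2", "doc3"),
                                 c("america", "revenue", "people")))

n.docs   <- nrow(counts)               # number of documents N
doc.freq <- colSums(counts > 0)        # number of documents containing each term
idf      <- log2(n.docs / doc.freq)    # inverse document frequency (assumed base 2)
tfidf    <- sweep(counts, 2, idf, `*`) # raw term frequency times idf
print(round(tfidf, 3))
```

In this toy example "people" appears in every document, so its weight is zero everywhere, while "revenue", which appears in only one document, receives the largest weight. This is why TF-IDF highlights terms that are specific to a speech rather than common to all of them.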
Overview of the most popular words in all inaugural speeches:
# text processing
ff.all<-Corpus(DirSource(folder.path))
ff.all<-tm_map(ff.all, stripWhitespace)
ff.all<-tm_map(ff.all, content_transformer(tolower))
ff.all<-tm_map(ff.all, removeWords, stopwords("english"))
ff.all<-tm_map(ff.all, removeWords, character(0))
ff.all<-tm_map(ff.all, removePunctuation)
dtm.all <- DocumentTermMatrix(ff.all,control = list(weighting = function(x)
weightTfIdf(x, normalize =FALSE),stopwords = TRUE))
ff.dtm.all=tidy(dtm.all)
dtm.overall=summarise(group_by(ff.dtm.all,term),sum(count))
wordcloud(dtm.overall$term, dtm.overall$`sum(count)`,
scale=c(3,0.1),
max.words=100,
min.freq=1,
random.order=FALSE,
rot.per=0.3,
use.r.layout=T,
random.color=FALSE,
colors=brewer.pal(8, "Accent")) # the "Accent" palette has at most 8 colors
A few words appear frequently in the inaugural speeches: “America”, “Union”, “Freedom”, “Congress”, “Constitution”, “Revenue”, “Democracy”, and so on. These words express the core values of the U.S. and reflect the challenges its people have faced from the early days to the present.
The Republican and Democratic parties are the two major parties in the U.S. today. We look at the interest words of presidents from these two parties to get a sense of the topics each party concentrates on.
Interest Words among two major parties:
# Interest Words of Presidents from the Republican Party
sel.repub<-filter(inau_infor,Party=="Republican")$President.order
doc.repub<-paste("inaug",sel.repub,".txt",sep="")
ff.dtm.repub<-filter(ff.dtm.all,document%in%doc.repub)
dtm.repub=summarise(group_by(ff.dtm.repub,term),sum(count))
wordcloud(dtm.repub$term, dtm.repub$`sum(count)`,
scale=c(3,0.2),
max.words=100,
min.freq=1,
random.order=FALSE,
rot.per=0.3,
use.r.layout=T,
random.color=FALSE,
colors=brewer.pal(8, "Accent")) # the "Accent" palette has at most 8 colors
# Interest Words of Presidents from the Democratic Party
sel.demo<-filter(inau_infor,Party=="Democratic")$President.order
doc.demo<-paste("inaug",sel.demo,".txt",sep="")
ff.dtm.demo<-filter(ff.dtm.all,document%in%doc.demo)
dtm.demo=summarise(group_by(ff.dtm.demo,term),sum(count))
wordcloud(dtm.demo$term, dtm.demo$`sum(count)`,
scale=c(3,0.2),
max.words=100,
min.freq=1,
random.order=FALSE,
rot.per=0.3,
use.r.layout=T,
random.color=FALSE,
colors=brewer.pal(8, "Accent")) # the "Accent" palette has at most 8 colors
From the two wordcloud plots, we notice that the Republicans use only a handful of words much more frequently than the rest: “America”, “Business”, “Freedom”, “Law(s)”, “Enforcement”, and “Congress”.
The Democrats use a larger set of words with high frequency: “Democracy”, “Union”, “America”, “Federal”, “Today”, “Millions”, and so on. It seems that the Democrats have more “most interest” words than the Republicans.
Interest words of Trump (58) vs. other contemporary presidents:
# Interest Words of Presidents from 1981 except Trump
sel.latest<-filter(inau_infor,Year>=1981&Year<2017)$President.order
doc.latest<-paste("inaug",sel.latest,".txt",sep="")
ff.dtm.latest<-filter(ff.dtm.all,document%in%doc.latest)
dtm.latest=summarise(group_by(ff.dtm.latest,term),sum(count))
wordcloud(dtm.latest$term, dtm.latest$`sum(count)`,
scale=c(3,0.2),
max.words=100,
min.freq=1,
random.order=FALSE,
rot.per=0.3,
use.r.layout=T,
random.color=FALSE,
colors=brewer.pal(9, "Set1"))
# Interest Words of Trump
doc.trump<-paste("inaug",inau_infor$President.order[58],".txt",sep="")
ff.dtm.trump<-filter(ff.dtm.all,document==doc.trump)
dtm.trump=summarise(group_by(ff.dtm.trump,term),sum(count))
wordcloud(dtm.trump$term, dtm.trump$`sum(count)`,
scale=c(3,0.2),
max.words=100,
min.freq=1,
random.order=FALSE,
rot.per=0.3,
use.r.layout=T,
random.color=FALSE,
colors=brewer.pal(9, "Set1"))
The wordcloud for recent years shows the challenges of this period as well as the hot topics of the last few decades: “America”, “Freedom”, “Journey”, “Women”, “Children”, “Century”, etc. It suggests that the presidents emphasize their views on U.S. values and propose their plans for contemporary problems by repeating these core words.
Trump’s interest words show that he also uses core words like “America” and mentions present-day challenges with words such as “jobs”, “factories”, and “loyalty”. There is no evidence that he departs from the normal inaugural speech style.
We set the number of topics to 10 and manually tag them as “Economy”, “Patriotism”, “Trust”, “Liberty”, “Government”, “CountryRelationship”, “Temporal”, “Election”, “Work&Life”, and “Future”. The tags follow the key words of each topic: Topic 1 contains “Revenue”, “Trade”, and “Tax”; Topic 2 contains “Spirit”, “Patriot”, and “Danger”; Topic 3 contains “Faith”, “Trust”, and “Support”; and so on.
Based on the most popular terms and the most salient terms for each topic, we assign a hashtag to each topic.
topics.hash=c("Economy", "Patriotism", "Trust", "Liberty", "Government", "CountryRelationship", "Temporal", "Election", "Work&Life", "Future")
corpus.list$ldatopic=as.vector(ldaOut.topics)
corpus.list$ldahash=topics.hash[ldaOut.topics]
colnames(topicProbabilities)=topics.hash
corpus.list.df=cbind(corpus.list, topicProbabilities)
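The objects `ldaOut.topics` and `topicProbabilities` come from a topic model fitted outside this excerpt. As a hedged illustration of how a dominant topic and its hashtag relate to a matrix of topic probabilities, the base R sketch below uses a made-up matrix `toy.prob` with three topics; all names prefixed with `toy.` are hypothetical.

```r
# Toy posterior topic probabilities: 4 documents x 3 topics (made-up values).
toy.prob <- matrix(c(0.7, 0.2, 0.1,
                     0.1, 0.3, 0.6,
                     0.2, 0.5, 0.3,
                     0.6, 0.3, 0.1),
                   nrow = 4, byrow = TRUE)
toy.hash <- c("Economy", "Patriotism", "Trust")
colnames(toy.prob) <- toy.hash

# Dominant topic per document: the column with the largest probability.
toy.topics <- unname(apply(toy.prob, 1, which.max))

# Map topic numbers to their hashtags, as done with topics.hash above.
toy.labels <- toy.hash[toy.topics]
print(data.frame(topic = toy.topics, label = toy.labels))
```

Indexing the hashtag vector by topic number is the same trick used in `topics.hash[ldaOut.topics]`, so the order of the hashtags must match the order of the topics in the fitted model.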
We use a heatmap to see the weight allocation of topics for each president:
Note that red indicates a higher weight on a topic. The heatmap shows that presidents of the same era tend to have similar weight allocations among the topics.
Before 1864, America was in the “Shaping the Nation” Era, and presidents emphasized Patriotism, Trust, and Government to unite the people.
From 1864 to 1939, America was in the “Rise to Power” Era. Presidents cared more about Economy, Election, and CountryRelationship issues.
Since 1939, America has been in the “World Leader” Era, and presidents have paid attention to topics related to Work&Life, Temporal, and Future. These days, presidents care more about people’s daily life and America’s future.
Now let us look at the topic allocation among the two major parties:
From the heatmap, we see that the Republican presidents talk more about Election, Economy, and CountryRelationship, while the Democrats focus on a wider range of topics, including Liberty, Trust, and so on. The key words match the core ideas of the two parties.
To get a better sense of each president’s topic allocation, we cluster the presidents according to their topic weights:
# generate 3 clusters
set.seed(2)
km.res=kmeans(scale(topic.summary[,-1]), iter.max=200, centers=3)
fviz_cluster(km.res, stand=T, repel= TRUE, data = topic.summary[,-1], show.clust.cent=FALSE)
The plot gives us some confidence that presidents of the same era tend to put more weight on the same topics, since they fall into the same topic cluster.
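To show what the clustering step does on a small scale, here is a sketch of k-means on a made-up topic-weight table. The data frame `toy.summary` and its values are hypothetical, and `nstart = 25` is an added choice (not used in the analysis above) that reruns the algorithm from several random starts to reduce sensitivity to initialization.

```r
set.seed(2)
# Toy topic-weight table: 6 hypothetical speeches x 3 topics (made-up values).
toy.summary <- data.frame(
  Speech     = paste0("speech", 1:6),
  Economy    = c(0.70, 0.65, 0.10, 0.15, 0.20, 0.15),
  Patriotism = c(0.20, 0.25, 0.75, 0.70, 0.15, 0.20),
  Future     = c(0.10, 0.10, 0.15, 0.15, 0.65, 0.65)
)

# Standardize the topic columns, then cluster into 3 groups.
toy.km <- kmeans(scale(toy.summary[, -1]), centers = 3,
                 iter.max = 200, nstart = 25)

# List which speeches ended up in each cluster.
print(split(toy.summary$Speech, toy.km$cluster))
```

With groups this well separated, speeches that share a dominant topic land in the same cluster; the cluster numbers themselves are arbitrary labels.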
In the last part, we will analyze the emotions expressed in the inaugural speeches. Using sentiment analysis, we can get how presidents convey their thoughts via emotional sentences to the public. We will analyze each sentence from the aspects of the eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive).
We assign each sentence to the emotion with the largest presence value. The overview of the emotional sentence allocation:
emo.vec<-c("anger", "anticipation","disgust","fear","joy","sadness","surprise", "trust")
# Attitude: [1] means negative, [2] means positive
for(i in 1:nrow(sentence.list)){
sentence.list$Emotion[i]<-emo.vec[which.max(sentence.list[i,12:19])]
sentence.list$Attitude[i]<-which.max(sentence.list[i,20:21])
}
p.all<-ggplot(data=sentence.list,aes(x=Emotion,fill=Index))+
geom_bar(position="dodge")
ggplotly(p.all)
From the plot, we see that presidents generally favor sentences expressing anger, anticipation, and trust. They use fewer sentences expressing disgust or sadness, and very few expressing surprise. They want to convey more positive thoughts to the public. Presidents in the early era talked about trust a lot, while contemporary presidents use more angry sentences.
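The assignment rule used for this plot (each sentence gets the emotion with the largest score) can also be written without an explicit loop. The sketch below is one possible vectorized version using `max.col()` on a made-up score matrix `toy.scores`; ties are resolved in favor of the first emotion, which matches the behavior of `which.max()`.

```r
emo.vec <- c("anger", "anticipation", "disgust", "fear",
             "joy", "sadness", "surprise", "trust")

# Toy emotion scores for 3 sentences (made-up values), one column per emotion.
toy.scores <- matrix(c(0.0, 0.5, 0.0, 0.0, 0.2, 0.0, 0.0, 0.3,
                       0.4, 0.0, 0.1, 0.3, 0.0, 0.2, 0.0, 0.0,
                       0.0, 0.1, 0.0, 0.0, 0.3, 0.0, 0.0, 0.6),
                     nrow = 3, byrow = TRUE, dimnames = list(NULL, emo.vec))

# Column index of the row-wise maximum, mapped to the emotion name.
toy.emotion <- emo.vec[max.col(toy.scores, ties.method = "first")]
print(toy.emotion)
```

On the real data, the matrix would be the eight emotion columns of `sentence.list` converted with `as.matrix()`. Note that a sentence with all-zero scores is still assigned the first emotion under this rule, so such sentences may deserve separate handling.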
We are also interested in the emotion allocation for the two major parties:
p.party<-ggplot(data=filter(sentence.list,Party%in%c("Republican","Democratic")),aes(x=Party,fill=Emotion))+
geom_bar(position="dodge")
ggplotly(p.party)
We notice that there is no large difference in emotion allocation between the two major parties.
We use a stacked plot to see how Trump delivers his emotions over the course of the inaugural speech:
speech.df=tbl_df(sentence.list)%>%filter(Index==58)%>%select(sent.id, anger:trust)
speech.df=as.matrix(speech.df)
speech.df[,-1]=f.smooth.topic(x=speech.df[,1], y=speech.df[,-1])
plot.stacked(speech.df[,1], speech.df[,2:9],
xlab="Sentences", ylab="Topic share", main="Donald Trump")
## [1] 0.2252233 0.4504465 0.6756698 0.9008930 1.1261163 1.3513395 1.5765628
## [8] 1.8017860
We see that Trump mainly conveys anticipation, joy, and trust in his speech. His sentiment flow fluctuates and rises to a high level at the end of the speech.
Let us see how the emotion flows of the other contemporary presidents behave at their first-term inaugurations.
[Stacked emotion plots for the first-term inaugural speeches of Reagan, George Bush, Clinton, George W. Bush, and Obama]
The plots show that every president has his own emotional style.
Reagan’s and George W. Bush’s emotions are evenly weighted across the sentences, though George W. Bush uses more negative words expressing fear, sadness, and disgust than the others.
George Bush speaks with flat and smooth emotions but ends with an emotional sentence emphasizing anticipation.
Both Clinton’s and Obama’s emotions fluctuate during the speech. Obama is famous for his speaking skills, and the plot shows that his emotions fluctuate more frequently than the others’ and that his speech is rich in different kinds of emotions. Obama also uses more positive words, especially those expressing trust and joy.
Comparing Trump’s sentiment stacked plot with those of the five presidents above, we cannot see a big difference in Trump’s speech strategy.
Conclusions
In general, presidents have tended to use shorter sentences in their inaugural speeches to impress people. Their main topics are chosen according to the challenges of their historical era. Nearly all of them speak positively in their inaugural speeches and emphasize “Trust” heavily, which reveals the nature of the inaugural speech.
For presidents from the two major U.S. parties, the Republican and the Democratic, there is no significant difference in sentence length or sentiment in their inaugural speeches. However, they do differ in key words and main topics. The Republicans have fewer key words and main topics than the Democrats. The Republicans focus on a few topics such as “Election” and “Economy”, so they often use words like “Congress” and “Business”. The Democrats have a wider range of topics, including “Liberty” and “Trust”, and therefore more key words than the Republicans.
Is Trump an alien? We have compared his speech with those of the other presidents of the contemporary era, as well as the other presidents from the Republican Party. We do not find a significant deviation. We only find that his key words differ somewhat from those of the other Republican presidents. That may be because he had no political experience before taking office and lacks practiced strategies for public political speeches.